Gang Wu, Tanya Leise, and John Hogenesch
2016-05-17
The MetaCycle package is mainly used for detecting rhythmic signals from large scale time-series data. It incorporates ARSER(ARS), JTK_CYCLE(JTK), and Lomb-Scargle(LS) properly for periodic signal detection, and could also output integrated analysis results if required.
The usual time-series data is evenly sampled once at each time point, and the interval between neighbour time points is integer. Not all data are as simple as this. There are datasets with replicate samples, or with missing values, or un-evenly sampled, or sampled with a non-integer interval. Examples of these types of data are shown in the below table.
| Data Type | Point 1 | Point 2 | Point 3 | Point 4 | Point 5 | Point 6 |
|---|---|---|---|---|---|---|
| The usual data | CT0 | CT4 | CT8 | CT12 | CT16 | CT20 |
| With missing value | CT0 | NA | CT8 | CT12 | CT16 | CT20 |
| With replicates | CT0 | CT0 | CT8 | CT8 | CT16 | CT16 |
| With un-even interval | CT0 | CT2 | CT8 | CT10 | CT16 | CT20 |
| With non-integer interval | CT0 | CT4.5 | CT9 | CT13.5 | CT18 | CT22.5 |
Of course, some datasets may seem combination of two or more of above types of data.
| Data Type | Point 1 | Point 2 | Point 3 | Point 4 | Point 5 | Point 6 |
|---|---|---|---|---|---|---|
| With replicates and missing value | CT0 | CT0 | CT8 | NA | CT16 | CT16 |
| With un-even interval and replicates | CT0 | CT2 | CT2 | CT10 | CT16 | CT20 |
The meta2d function in MetaCycle is designed to analyze differnt types of time-series datasets, and it could automatically select proper method to analyze different types of input datasets. The implementation strategy used for meta2d is shown in the flow chart.
In addition to selecting proper methods to analyze different kinds of datasets, meta2d could also output integrated results from multiple methods. Detail explaination about integration stragegies is in the vignettes of MetaCycle.
\[Y_i = B + TRE*(t_i - \frac{\sum_{i=1}^n t_i}{n}) + A*cos(2*\pi*\frac{t_i - PHA}{PER})\]
- B is baseline level of the time-series profile
- TRE is trend level of the time-series profile
- A is the amplitude of the waveform
- PHA and PER are integrated period and phase mentioned above.
- The baseline and trend level are explained in the below example.
| Column name | Description | Column name | Description |
|---|---|---|---|
| ARS_pvalue | pvalue from ARS | LS_BH.Q | FDR from LS |
| ARS_BH.Q | FDR from ARS | LS_period | period from LS |
| ARS_period | period from ARS | LS_adjphase | adjusted phase from LS |
| ARS_adjphase | adjusted phase from ARS | LS_amplitude | amplitude from LS |
| ARS_amplitude | amplitude from ARS | meta2d_pvalue | integrated pvalue |
| JTK_pvalue | pvalue from JTK | meta2d_BH.Q | FDR based on integrated pvalue |
| JTK_BH.Q | FDR from JTK | meta2d_period | averaged period |
| JTK_period | period from JTK | meta2d_phase | integrated phase |
| JTK_adjphase | adjusted phase from JTK | meta2d_Base | baseline value given by meta2d |
| JTK_amplitude | amplitude from JTK | meta2d_AMP | amplitude given by meta2d |
| LS_pvalue | pvalue from LS | meta2d_rAMP | relative amplitude |
# load 'shiny' package
library("shiny")
# load 'MetaCycle' package
library("MetaCycle")
# load 'dplyr' package
library("dplyr")
# load 'ggplot2' package
library("ggplot2")
# load 'cowplot' package
library("cowplot")# change working directory to the desktop
setwd("~/Desktop")
# check required directories under the desktop
dir()# load shiny package
library("shiny")
# use 'runGitHub' function of shiny package
runGitHub("MetaCycleApp", "gangwug")# load shiny package
library("shiny")
# use 'runApp' function of shiny package
runApp("~/Desktop/MetaCycleApp-master")# load dplyr package
library("dplyr")
# read in the 'meta2d_experientA.csv' file
dataD <- read.csv("~/Desktop/SRBR_SMTSAworkshop-master/result/meta2d_experimentA.csv",
stringsAsFactors = FALSE)
# take a look at column names of the data
colnames(dataD)## [1] "CycID" "JTK_pvalue" "JTK_BH.Q" "JTK_period"
## [5] "JTK_adjphase" "JTK_amplitude" "LS_pvalue" "LS_BH.Q"
## [9] "LS_period" "LS_adjphase" "LS_amplitude" "meta2d_pvalue"
## [13] "meta2d_BH.Q" "meta2d_period" "meta2d_phase" "meta2d_Base"
## [17] "meta2d_AMP" "meta2d_rAMP" "X18" "X20"
## [21] "X22" "X24" "X26" "X28"
## [25] "X30" "X32" "X34" "X36"
## [29] "X38" "X40" "X42" "X44"
## [33] "X46" "X48" "X50" "X52"
## [37] "X54" "X56" "X58" "X60"
## [41] "X62" "X64"
# filter the data by meta2d_BH.Q < 0.05 by 'filter' function of dplyr
cirD <- filter(dataD, meta2d_BH.Q < 0.05)
# see the number of circadian transcripts
nrow(cirD)## [1] 706
# select "CycID", "meta2d_period", "meta2d_phase", and "X18" to "X64" columns
# for drawing heatmap by 'select' function of dplyr package
figureD <- select(cirD, CycID, meta2d_period, meta2d_phase,
num_range("X", seq(18, 64, by=2), width = 2))# load 'heatmapF' and 'phaseHist' function used to draw heatmap and histogram
source("~/Desktop/SRBR_SMTSAworkshop-master/R/fig.R")
# get the heatmap figure by 'heatmapF' function
heatmapFigure <- heatmapF(inputD = figureD, minfold=0.5, maxfold=2)
heatmapFigure# get the phase distribtuion figure by 'phaseHist' function
histFigure <- phaseHist(inputD = figureD, binvalue=seq(0,24,by=2), histcol = "blue")
histFigure# take a look at column names of 'cirD' dataframe
colnames(cirD)## [1] "CycID" "JTK_pvalue" "JTK_BH.Q" "JTK_period"
## [5] "JTK_adjphase" "JTK_amplitude" "LS_pvalue" "LS_BH.Q"
## [9] "LS_period" "LS_adjphase" "LS_amplitude" "meta2d_pvalue"
## [13] "meta2d_BH.Q" "meta2d_period" "meta2d_phase" "meta2d_Base"
## [17] "meta2d_AMP" "meta2d_rAMP" "X18" "X20"
## [21] "X22" "X24" "X26" "X28"
## [25] "X30" "X32" "X34" "X36"
## [29] "X38" "X40" "X42" "X44"
## [33] "X46" "X48" "X50" "X52"
## [37] "X54" "X56" "X58" "X60"
## [41] "X62" "X64"
# if it reports error after typing above command, please re-run the code of preparing
# 'figureD' in the first part of 3.5 section of this demo# select its 'CycID', "meta2d_BH.Q", 'meta2d_phase' columns for later analysis
phaseD <- select(cirD, CycID, meta2d_BH.Q, meta2d_phase)
# read in the annotation file-'annoDbiMainInfMouse4302.txt' in the 'data-raw' directory
annoD <- read.delim("~/Desktop/SRBR_SMTSAworkshop-master/data-raw/Mouse4302ProbeAnnotation.txt",
stringsAsFactors = FALSE)
# add annotation information with 'inner_join' function of dplyr package
phaseD <- inner_join(phaseD, annoD, by=c("CycID" = "PROBEID"))
# take a look at the phaseD
head(phaseD)## CycID meta2d_BH.Q meta2d_phase ENTREZID SYMBOL
## 1 1416273_at 6.267205e-04 0.9112647 21928 Tnfaip2
## 2 1418853_at 1.005625e-04 2.2190313 28194 Apon
## 3 1418322_at 1.267709e-03 21.7744648 12916 Crem
## 4 1435748_at 5.230269e-08 16.5923801 14544 Gda
## 5 1448993_at 3.079038e-02 16.0442955 67841 Atg3
## 6 1428889_at 4.627240e-02 4.8168048 69113 Alkbh3
# filter out those probesets without annotation information
phaseD <- filter(phaseD, SYMBOL != "NA")
# select 'SYMBOL', 'meta2d_BH.Q' and 'meta2d_phase' columns for
# getting a dataframe without duplicate gene names
ori_phaseD <- select(phaseD, SYMBOL, meta2d_BH.Q, meta2d_phase)
# take a look at the row number with possilbe duplicate gene names
dim(ori_phaseD)## [1] 698 3
# load the 'uniF' function from 'fig.R' for doing this work
source("~/Desktop/SRBR_SMTSAworkshop-master/R/fig.R")
# get a dataframe without duplicate gene names
uni_phaseD <- uniF(ori_phaseD)
# take a look at the rownumber now
dim(uni_phaseD)## [1] 643 3
# select 'SYMBOL' and 'meta2d_phase' columns for PSEA analysis
pseaD <- select(uni_phaseD, SYMBOL, meta2d_phase)
# take a look at the pseaD
head(pseaD)## SYMBOL meta2d_phase
## 1 1110059E24Rik 15.0817948
## 2 1190002N15Rik 15.7701032
## 3 1700023H06Rik 6.0991388
## 4 1810055G02Rik 11.2992141
## 5 2310035C23Rik 1.6562627
## 6 2610507B11Rik 0.8431068
# write the 'pseaD' dataframe to a txt file-'experimentPSEA.txt' in 'result' directory
write.table(pseaD, file = "~/Desktop/SRBR_SMTSAworkshop-master/result/experimentPSEA.txt",
sep = "\t", quote = FALSE, row.names = FALSE)# read in the 'experimentPSEA.txt' file
ori_pseaD <- read.delim("~/Desktop/SRBR_SMTSAworkshop-master/result/experimentPSEA.txt",
stringsAsFactors = FALSE)
# take a look at the data
head(ori_pseaD)## SYMBOL meta2d_phase
## 1 1110059E24Rik 15.0817948
## 2 1190002N15Rik 15.7701032
## 3 1700023H06Rik 6.0991388
## 4 1810055G02Rik 11.2992141
## 5 2310035C23Rik 1.6562627
## 6 2610507B11Rik 0.8431068
# read in the 'MouseHumanHomolog.txt' file in 'data-raw' directory of SRBR_SMTSAworkshop-master
homoD <- read.delim("~/Desktop/SRBR_SMTSAworkshop-master/data-raw/MouseHumanHomolog.txt",
stringsAsFactors = FALSE)
# take a look at the data
head(homoD)## Mouse_gene Human_gene
## 1 Acadm ACADM
## 2 Acadvl ACADVL
## 3 Acat1 ACAT1
## 4 Acvr1 ACVR1
## 5 Sgca SGCA
## 6 Adsl ADSL
# join mouse gene and human homolog gene with 'inner_join' function
homo_pseaD <- inner_join(ori_pseaD, homoD, by=c("SYMBOL" = "Mouse_gene"))
# take a look at the joined dataframe
head(homo_pseaD)## SYMBOL meta2d_phase Human_gene
## 1 1110059E24Rik 15.0817948 C9orf85
## 2 1190002N15Rik 15.7701032 C3orf58
## 3 1810055G02Rik 11.2992141 C11orf24
## 4 2310035C23Rik 1.6562627 KIAA1468
## 5 2610507B11Rik 0.8431068 KIAA0100
## 6 2700094K13Rik 9.9926117 C11orf31
# select "Human_gene" and "meta2d_phase" columns for PSEA analysis
outD <- select(homo_pseaD, Human_gene, meta2d_phase)
# take a look at the selected data
head(outD)## Human_gene meta2d_phase
## 1 C9orf85 15.0817948
## 2 C3orf58 15.7701032
## 3 C11orf24 11.2992141
## 4 KIAA1468 1.6562627
## 5 KIAA0100 0.8431068
## 6 C11orf31 9.9926117
# write it to a file named-'human_experimentPSEA.txt' in 'result' directory of SRBR_SMTSAworkshop-master
write.table(outD, file = "~/Desktop/SRBR_SMTSAworkshop-master/result/human_experimentPSEA.txt",
sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)